Skip to content

[SPARK-28687][SQL] Support epoch, isoyear, milliseconds and microseconds at extract()#25408

Closed
MaxGekk wants to merge 21 commits intoapache:masterfrom
MaxGekk:extract-ext3
Closed

[SPARK-28687][SQL] Support epoch, isoyear, milliseconds and microseconds at extract()#25408
MaxGekk wants to merge 21 commits intoapache:masterfrom
MaxGekk:extract-ext3

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 11, 2019

What changes were proposed in this pull request?

In the PR, I propose new expressions Epoch, IsoYear, Milliseconds and Microseconds, and support additional parameters of extract() for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT):

  1. epoch - the number of seconds since 1970-01-01 00:00:00 local time in microsecond precision.
  2. isoyear - the ISO 8601 week-numbering year that the date falls in. Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January.
  3. milliseconds - the seconds field including fractional parts multiplied by 1,000.
  4. microseconds - the seconds field including fractional parts multiplied by 1,000,000.

Here are examples:

spark-sql> SELECT EXTRACT(EPOCH FROM TIMESTAMP '2019-08-11 19:07:30.123456');
1565550450.123456
spark-sql> SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-01');
2005
spark-sql> SELECT EXTRACT(MILLISECONDS FROM TIMESTAMP '2019-08-11 19:07:30.123456');
30123.456
spark-sql> SELECT EXTRACT(MICROSECONDS FROM TIMESTAMP '2019-08-11 19:07:30.123456');
30123456

How was this patch tested?

Added new tests to DateExpressionsSuite, and uncommented existing tests in extract.sql and pgSQL/date.sql.

@SparkQA
Copy link

SparkQA commented Aug 11, 2019

Test build #108930 has finished for PR 25408 at commit d16a6ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 11, 2019

Test build #108936 has finished for PR 25408 at commit f152004.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 12, 2019

Test build #108954 has finished for PR 25408 at commit f025e6e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR's milliseconds implementation is inconsistent from PostgreSQL. It must be due to a simple misunderstanding on the contracts.

PostgreSQL

postgres=# select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
 date_part
-----------
  9123.457
(1 row)

This PR

spark-sql> select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
9123

Could you fix that and double-check all the other function implementations once again by comparing the full sub-second parts? Also, the document should be updated together to avoid this kind of confusions at user-sides.

In addition, PostgreSQL accepts more patterns like MSEC. Since this is for PostgreSQL Feature Parity, we had better support all patterns.

postgres=# SELECT EXTRACT(MSEC FROM TIMESTAMP   '2011-05-06 07:08:09.1234567');
 date_part
-----------
  9123.457
(1 row)

@SparkQA
Copy link

SparkQA commented Aug 12, 2019

Test build #108956 has finished for PR 25408 at commit c0dea7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 12, 2019

@dongjoon-hyun Just in case, you checked not only behavior of extract but parsing string to timestamp as well. I mean rounding .1234567 to .123456. After the commit 3c848e7 :

spark-sql> select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
9123.456

@dongjoon-hyun
Copy link
Member

Thanks, @MaxGekk . BTW, please don't forget the other part like MSECsupport in my comment.

@SparkQA
Copy link

SparkQA commented Aug 12, 2019

Test build #108990 has finished for PR 25408 at commit 3c848e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 13, 2019

Test build #109017 has finished for PR 25408 at commit a317a64.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 13, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Aug 13, 2019

Test build #109024 has finished for PR 25408 at commit a317a64.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 13, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Aug 13, 2019

Test build #109033 has finished for PR 25408 at commit a317a64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Minute(expression(ctx.source))
case "SECOND" =>
Second(expression(ctx.source))
case "MILLISECONDS" | "MSEC" =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, @MaxGekk . Did you check PostgreSQL to find out all supported variants?
Please check the variants really instead of relying on my comments. 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me be clear. I didn't check PostgreSQL full variants, but, at least, MS and MSECS are missing here.

cc @maropu . Could you help @MaxGekk to complete this feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check PostgreSQL to find out all supported variants?

I checked PostgreSQL documentation and JIRA ticket that mention milliseconds only. As far as I see timestamp.sql uses msec / msec as well.

at least, MS and MSECS are missing here.

Do we need to support them if they are not used in tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the variants really instead of relying on my comments. 😅

I relied on existing test

-- date_part( 'quarter', d1) AS quarter, date_part( 'msec', d1) AS msec,
-- date_part( 'usec', d1) AS usec
not on your comment, sorry :-E

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 13, 2019

@dongjoon-hyun It seems there are many synonyms for the same units in PostgreSQL: https://github.com/postgres/postgres/blob/64579be64a3811d08ea7ccf92b1a443d76b96412/src/backend/utils/adt/datetime.c#L171-L234 . Are you sure we must support all of them?

@SparkQA
Copy link

SparkQA commented Aug 13, 2019

Test build #109051 has finished for PR 25408 at commit 7fcb6a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Could you rebase this to the master please, @MaxGekk ?

@SparkQA
Copy link

SparkQA commented Aug 14, 2019

Test build #109091 has finished for PR 25408 at commit 153d2ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @MaxGekk , @maropu , @HyukjinKwon .
Merged to master!

@MaxGekk MaxGekk deleted the extract-ext3 branch October 5, 2019 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments